Semi-automatic Approach to Building Dictionary between Slavonic Languages
Abstract
Machine translation between Slavonic languages is still in its early stages. The existence of bilingual dictionaries has a big impact on translation quality. Unfortunately, creating such language resources is quite expensive. For small languages like Czech, Slovak or Slovenian, it is almost certain that a large enough dictionary will not be commercially successful. Slavonic languages tend to range between close and very close, so it is possible to infer some translation pairs. Our presentation focuses on describing a semi-automatic approach that uses 'cheap' resources to build Czech-Slovak and Serbian-Slovenian dictionaries. These resources are stacked so that earlier phases yield results of higher precision. Our results show that this approach improves the efficiency of building dictionaries for close languages.
Petr Sojka, Aleš Horák (Eds.): Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2008, pp. 10–10, 2008. © Masaryk University, Brno 2008
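The idea of stacking resources so that earlier phases are more precise can be illustrated with a minimal Python sketch. The abstract does not name the actual resources or phases, so everything here is an assumption made for illustration: the hypothetical function `stacked_candidates`, the choice of exact spelling match as phase 1, the use of `difflib.SequenceMatcher` string similarity as phase 2, and the `min_ratio` threshold.

```python
from difflib import SequenceMatcher


def stacked_candidates(source_words, target_words, min_ratio=0.75):
    """Return {source_word: (target_word, phase)} from two stacked phases.

    Hypothetical illustration only; not the method described in the paper.
    """
    target_set = set(target_words)
    pairs = {}

    # Phase 1 (assumed highest precision): identical spelling in both languages.
    for word in source_words:
        if word in target_set:
            pairs[word] = (word, 1)

    # Phase 2 (assumed lower precision): most similar spelling above a
    # threshold, applied only to words that phase 1 left unresolved.
    for word in source_words:
        if word in pairs:
            continue
        best, best_ratio = None, 0.0
        for candidate in target_words:
            ratio = SequenceMatcher(None, word, candidate).ratio()
            if ratio > best_ratio:
                best, best_ratio = candidate, ratio
        if best is not None and best_ratio >= min_ratio:
            pairs[word] = (best, 2)

    return pairs


# Toy word lists (Czech source, Slovak target), chosen for illustration.
czech = ["voda", "ruka", "město", "kočka"]
slovak = ["voda", "ruka", "mesto", "mačka", "pes"]
result = stacked_candidates(czech, slovak)
```

In this toy run, `voda` and `ruka` are paired in phase 1 and `město` is paired with `mesto` in phase 2, while `kočka` stays unpaired because its similarity to `mačka` falls below the threshold. Leftovers of this kind are what a semi-automatic workflow would pass on to a later phase or to a human editor.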
Similar resources
Robust Ending Guessing Rules With Application To Slavonic Languages
The paper studies the automatic extraction of diagnostic word endings for Slavonic languages aimed to determine some grammatical, morphological and semantic properties of the underlying word. In particular, ending guessing rules are being learned from a large morphological dictionary of Bulgarian in order to predict POS, gender, number, article and semantics. A simple exact high accuracy algori...
Extracting Translation Lexicons from Bilingual Corpora: Application to South-Slavonic Languages
The paper presents a novel approach for automatic translation lexicon extraction from a parallel sentence-aligned corpus. This is a five-step process, which includes cognate extraction, word alignment, phrase extraction, statistical phrase filtering, and linguistic phrase filtering. Unlike other approaches whose objective is to extract word or phrase pairs to be used in machine translation, we ...
Spelling-checking for Highly Inflective Languages
Spelling-checkers have become an integral part of most text processing software. For various reasons, chief among them processing speed, they are usually based on dictionaries of word forms instead of words. This approach is sufficient for languages with little inflection such as English, but fails for highly inflective languages such as Czech, Russian, Slovak or other Slavonic lang...
Semi-Automatic Extension of Sanskrit Wordnet using Bilingual Dictionary
In this paper, we report our methods and results of using, for the first time, a semi-automatic approach to enhance an Indian language Wordnet. We apply our methods to enhancing an already existing Sanskrit Wordnet created from Hindi Wordnet (which is created from Princeton Wordnet) using the expansion approach. We base our experiment on an existing bilingual Sanskrit-English Dictionary and show how ...
Rsdnet: a Web-based Collaborative Framework for Building Multilingual Semantic Networks
We present a system (RSDnet) that allows non-expert Web users to contribute towards building a multilingual lexical resource. Our study focuses on the Romanian-English language pair, and the target resource is a Romanian WordNet strongly connected to the English WordNet. We use a bilingual dictionary, a monolingual definition dictionary and documents on the Web to build synsets, attach them a g...
Publication date: 2008